Possible Data Inputs

This chapter describes the data inputs that CUBE Analyst can use. Topics include the types of data and the sets of data.

Types of data

CUBE Analyst can operate using some or all of the following types of data:

Note: CUBE Analyst requires confidence level information for all data types except routing information and the cost distribution function.

Link counts

For highways, this information can be surveyed with considerable accuracy, for example by using automatic counters. However, where congestion has constricted flows, the counts may not show the current demand for travel (which the O-D matrix should represent).

For public transport, this data is often obtained from estimates of passenger numbers in buses and rail carriages, and is of inherently limited accuracy (but may still be usefully exploited by CUBE Analyst).

For both modes, note that matrices normally represent average conditions, so individual counts will match them only to some extent.

Link counts that are spread randomly across the network contribute relatively little information to the estimation of matrix cells. This may be less of a problem for public transport networks, which offer limited alternative routes, than for highway networks, which have inherently greater route choice.

Turning counts

The same comments as for link counts apply. Note that turning counts can be used only when the input is a CUBE Voyager path file. They are not supported for an estimation that uses a TRIPS RCP file.

Prior trip matrix

This matrix might be an out-of-date matrix for the study area, or possibly a forecast for the present day from a previous study. A prior trip matrix is not essential, but in practice it is very desirable because it provides information about the pattern of trip movements.

Trip cost matrix

This matrix summarizes the cost of travel between zones, where cost is normally defined as a user-specified combination of time, distance, and any charges such as tolls or fares. The trip cost matrix may be used as a substitute when some or all of a prior matrix is not available. The costs may be based on either modelled or surveyed speed data.
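The "user-specified combination" described above can be sketched as a weighted sum. This is only an illustration: the function name, the weights, and the three-zone skim values are all assumptions, not CUBE Analyst settings or defaults.

```python
def generalized_cost(time_min, distance_km, charge=0.0,
                     time_weight=1.0, distance_weight=0.5, charge_weight=1.0):
    """Combine time, distance and charges (tolls or fares) into one cost.

    The weights are hypothetical; in practice the user chooses them.
    """
    return (time_weight * time_min
            + distance_weight * distance_km
            + charge_weight * charge)


# Hypothetical zone-to-zone skims for a 3-zone network.
time_min = [[0, 12, 20], [12, 0, 9], [20, 9, 0]]
distance_km = [[0, 8, 15], [8, 0, 6], [15, 6, 0]]
charge = [[0, 0, 2], [0, 0, 0], [2, 0, 0]]

n_zones = len(time_min)
cost_matrix = [
    [generalized_cost(time_min[i][j], distance_km[i][j], charge[i][j])
     for j in range(n_zones)]
    for i in range(n_zones)
]
```

With these assumed weights, the cost from zone 0 to zone 2 is 20 + 0.5 × 15 + 2 = 29.5 cost units.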

Partial O-D matrix

This is another way of providing the prior matrix, one that makes it possible to use information that specifies some cells of the matrix but not all. The user assigns a (relatively) high confidence to the cells that have been observed and allows other information to determine values in the remaining cells. That other information may be data from the cost matrix, in which case the corresponding prior matrix cells must be zero. Alternatively, non-observed cells are given non-zero values with zero or low confidence levels. Zero values in input matrices are taken to indicate that trips in the corresponding cells are impossible. Cost data are not used to estimate trips for cells that have non-zero prior-trip values.

This approach makes CUBE Analyst useful when surveys have been conducted around critical parts of a study area (for example, town centers, travel corridors, etc.), but there remains a need to estimate the matrix for the rest of the area.
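The preparation of a partial prior matrix can be sketched as follows. This is a minimal illustration, not CUBE Analyst's file format or API: the function name, the zero-based zone indices, and the numeric confidence scale (larger meaning more trusted) are all assumptions. How the program then treats zero cells is governed by the rules stated above, not by this sketch.

```python
# Hypothetical confidence scale: larger means more trusted. These numbers are
# placeholders, not values prescribed by CUBE Analyst.
HIGH_CONFIDENCE = 90.0
LOW_CONFIDENCE = 5.0


def build_partial_prior(n_zones, observed, rough=None):
    """Return (prior, confidence) matrices as nested lists.

    observed: {(origin, destination): trips} for surveyed cells.
    rough:    optional {(origin, destination): trips} with non-zero guesses
              for unobserved cells (for example, from an out-of-date matrix).
    Cells in neither mapping stay at zero prior and zero confidence.
    """
    rough = rough or {}
    prior = [[0.0] * n_zones for _ in range(n_zones)]
    confidence = [[0.0] * n_zones for _ in range(n_zones)]

    # Unobserved cells with a rough value: non-zero prior, low confidence.
    for (o, d), trips in rough.items():
        prior[o][d] = float(trips)
        confidence[o][d] = LOW_CONFIDENCE

    # Observed cells override any rough value and get high confidence.
    for (o, d), trips in observed.items():
        prior[o][d] = float(trips)
        confidence[o][d] = HIGH_CONFIDENCE

    return prior, confidence


observed_cells = {(0, 1): 120, (1, 0): 95}   # surveyed movements (made up)
rough_cells = {(0, 2): 40, (0, 1): 200}      # old-matrix guesses (made up)
prior, confidence = build_partial_prior(3, observed_cells, rough_cells)
```

Here cell (0, 1) takes the surveyed value with high confidence even though a rough guess also exists, while cell (0, 2) keeps its rough value with low confidence.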

Trip ends

The total number of trips generated from and attracted to zones (G&A) may be obtained either from surveys or from mathematical land-use type models. Surveys are appropriate when zone boundaries are such that traffic may be counted entering and leaving zones on distinct trips, rather than merely passing through the zone. This tends to occur only for some zones, for example a car park or an industrial estate, but these are often important zones for a study.

It is possible to use data derived from both methods, for example, a few zones surveyed and the remainder derived from a model, with the resulting trip ends distinguished through differing confidence levels.
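Combining surveyed and modelled trip ends might look like the sketch below. The function name, zone labels, and confidence values are hypothetical; the only point taken from the text is that surveyed zones and modelled zones carry different confidence levels.

```python
def merge_trip_ends(modelled, surveyed, model_confidence=20.0,
                    survey_confidence=80.0):
    """Merge trip ends per zone into {zone: (trips, confidence)}.

    modelled: {zone: trips} from a land-use type model (all zones).
    surveyed: {zone: trips} from surveys (typically only a few zones).
    The confidence values are illustrative placeholders.
    """
    merged = {zone: (float(trips), model_confidence)
              for zone, trips in modelled.items()}
    # Surveyed values replace modelled ones and carry higher confidence.
    for zone, trips in surveyed.items():
        merged[zone] = (float(trips), survey_confidence)
    return merged


modelled_generations = {"Z1": 500, "Z2": 320, "Z3": 150}
surveyed_generations = {"Z3": 180}   # for example, a car park counted directly
generations = merge_trip_ends(modelled_generations, surveyed_generations)
```

The same helper could be applied separately to attractions.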

Routing information

It is possible to survey routing data, though this is rarely done. Modelled routing often replicates actual (erratic) driver or passenger routing poorly, so it is often not possible to place much reliance on this otherwise important data. CUBE Analyst is therefore designed to use routing information, as far as possible, only where the precise routing does not matter. Thus, for skim cost information, small variations in routes may be ignored, while count information is used in bottleneck situations where the number of routes is limited to a few alternative links (ideally one).

Cost distribution function

Many areas which have been the subject of previous studies will have a previously calibrated mathematical trip-cost distribution function, as used in the gravity model. Because CUBE Analyst contains its own calibration procedures, the information implied by the distribution function is not normally used directly, although the a and b parameters, discussed later, may be fixed with reference to a previously calibrated gravity model.

Part-trip data

This data is surveyed in the form of matrices where the recorded origin and destination are not necessarily the ultimate origin and destination of the trip. The figure "Definition of part-trip data" illustrates this by showing the recorded part of the trip (S - E) relative to the total trip (O - D). It is possible for one or both of points S and E to coincide with the corresponding points O and D. For highways, this data is typically obtained from licence plate matching surveys; for public transport, it is typically obtained from on-board surveys that record passenger boarding and alighting points.

Figure: Definition of part-trip data

Sets of data

CUBE Analyst estimates one matrix at a time, and the data should form a set related to this particular matrix. That is, the data should correspond to the same time period (hour(s) of day, day of week, time of year) as the matrix, and to the same units of flow (vehicles, PCUs, passengers, etc.). Sometimes the user will have to transform data (for example, by factoring) to achieve this, and this will usually imply a reduction (small or large) in confidence levels for the transformed data.
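Factoring a count to match the matrix, with a matching reduction in confidence, can be sketched like this. The function name, the peak-hour share, and the fraction of confidence retained are all assumptions chosen for illustration; the text does not prescribe how large the reduction should be.

```python
def factor_count(count, factor, confidence, confidence_retained=0.8):
    """Scale a count to the matrix's period or units and reduce confidence.

    factor:              multiplier converting the surveyed count to the
                         target period or unit (hypothetical value).
    confidence_retained: fraction of the original confidence kept after
                         factoring; must lie in [0, 1] (an assumption).
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")
    if not 0.0 <= confidence_retained <= 1.0:
        raise ValueError("confidence_retained must lie in [0, 1]")
    return count * factor, confidence * confidence_retained


# A made-up 12-hour count converted to a peak-hour flow.
twelve_hour_count = 9600.0
peak_hour_share = 0.1
peak_count, peak_confidence = factor_count(
    twelve_hour_count, peak_hour_share, confidence=80.0
)
```

The factored count keeps its place in the data set, but with a lower confidence than a count surveyed directly in the target period would have.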

Also, only one set of information may be input into CUBE Analyst for an estimation. Hence, if multiple sets exist (say, several traffic counts for the same link), the user must derive a single set from them. This might simply mean choosing the most recently surveyed set, or it might mean taking a weighted average of all available sets. Multiple sets of data usually allow confidence levels to be increased relative to single sets of data.
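Deriving a single value from several counts on the same link can be sketched as a weighted average. The function name, the counts, and the weights are hypothetical; choosing only the most recent survey corresponds to giving that survey a weight of 1 and all others a weight of 0.

```python
def combine_counts(counts, weights):
    """Return the weighted average of several counts for one link.

    counts:  surveyed flows for the same link, period and units.
    weights: non-negative weights, for example favouring recent surveys
             (illustrative values only).
    """
    if not counts or len(counts) != len(weights):
        raise ValueError("counts and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(c * w for c, w in zip(counts, weights)) / total_weight


# Three made-up surveys of one link, oldest first; the latest counts double.
link_counts = [1000.0, 1100.0, 1200.0]
averaged = combine_counts(link_counts, [1, 1, 2])
most_recent = combine_counts(link_counts, [0, 0, 1])
```

The confidence attached to the combined value is a separate judgment for the user and is not computed here.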